Benchmarking the performance of human antibody gene alignment utilities using a 454 sequence dataset
Authors
Abstract
MOTIVATION: Immunoglobulin heavy chain genes are formed by recombination of genes randomly selected from sets of IGHV, IGHD and IGHJ genes. Utilities have been developed to identify the genes that contribute to observed VDJ rearrangements, but in the absence of datasets of known rearrangements, evaluating these utilities is problematic. We have analyzed thousands of VDJ rearrangements from an individual (S22) whose IGHV, IGHD and IGHJ genotype can be inferred from the dataset. Knowledge of this genotype means that the Stanford_S22 dataset can serve to benchmark the performance of IGH alignment utilities.
RESULTS: We evaluated the performance of seven utilities. Failure to partition a sequence into genes present in the S22 genome was considered an error, and error rates for the different utilities ranged from 7.1% to 13.7%.
AVAILABILITY: Supplementary data include the S22 genotypes and alignments. The Stanford_S22 dataset and an evaluation tool are available at http://www.emi.unsw.edu.au/~ihmmune/IGHUtilityEval/.
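The error criterion described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions: the function names, the dictionary layout, and the toy genotype and calls below are hypothetical and are not taken from the paper or its evaluation tool. The idea is only that a utility's call counts as an error when any assigned IGHV, IGHD or IGHJ gene is absent from the individual's inferred genotype.

```python
from typing import Iterable, Mapping, Set

SEGMENTS = ("V", "D", "J")


def is_error(call: Mapping[str, str], genotype: Mapping[str, Set[str]]) -> bool:
    """Return True if any assigned gene is not in the inferred genotype."""
    return any(call[seg] not in genotype[seg] for seg in SEGMENTS)


def error_rate(calls: Iterable[Mapping[str, str]],
               genotype: Mapping[str, Set[str]]) -> float:
    """Fraction of calls that use at least one gene absent from the genotype."""
    calls = list(calls)
    if not calls:
        raise ValueError("no calls to evaluate")
    return sum(is_error(c, genotype) for c in calls) / len(calls)


# Hypothetical toy genotype (NOT the actual S22 genotype).
genotype = {
    "V": {"IGHV1-69*01", "IGHV3-23*01"},
    "D": {"IGHD3-10*01"},
    "J": {"IGHJ4*02"},
}

# Hypothetical calls from one alignment utility.
calls = [
    {"V": "IGHV3-23*01", "D": "IGHD3-10*01", "J": "IGHJ4*02"},  # consistent
    {"V": "IGHV3-30*18", "D": "IGHD3-10*01", "J": "IGHJ4*02"},  # V not in genotype
    {"V": "IGHV1-69*01", "D": "IGHD2-2*01", "J": "IGHJ4*02"},   # D not in genotype
    {"V": "IGHV1-69*01", "D": "IGHD3-10*01", "J": "IGHJ4*02"},  # consistent
]

rate = error_rate(calls, genotype)
print(f"{rate:.1%}")  # 50.0% for this toy input
```

A call that uses only genotype genes is not necessarily correct, so a rate computed this way is better read as a lower bound on a utility's true misassignment rate.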
Similar resources
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
Supplementary information: Lacking alignments? The next-generation sequencing mapper segemehl revisited
Each artificial dataset consists of 100 000 single- or paired-end reads and was simulated using Mason v0.1.1 [1] from the Human genome (hg19), excluding haplotypes, random contigs, and ‘non-chromosomal’ sequences. For the single-end Illumina datasets, Mason was run in Illumina mode with parameters -hn2, -sq, and the read length (-n) set to 100 and 30 for long and short reads, respectively. For th...
Genetic variations of avian Pasteurella multocida as demonstrated by 16S-23S rRNA gene sequences comparison
Pasteurella multocida is known as an important heterogenic bacterial agent that causes severe diseases such as fowl cholera in poultry and haemorrhagic septicaemia in cattle and buffalo. A polymerase chain reaction (PCR) assay was developed using primers derived from a conserved part of the 16S-23S rRNA gene. The PCR amplified a fragment of 0.7 kb using DNA from nine avian P. multocida isolates...
An Evolutionary and Phylogenetic Study of the BMP15 Gene
DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...
Construction and cloning of a recombinant expression vector containing human Cd20 Gene for antibody therapy in Non-Hodgkin Lymphoma
ABSTRACT Introduction: Non-Hodgkin lymphoma (NHL) is a cancer that starts in lymphocytes. The main treatment for NHL is chemotherapy and radiation. Today immunotherapy is a promising therapeutic approach in the treatment of a variety of cancers and, unlike previous methods, is highly specific. Antibodies do not penetrate effectively into tumor tissues because of their large size. Whe...
Journal: Bioinformatics
Volume 26, Issue 24
Pages: -
Publication date: 2010